Reviews: Online Normalization for Training Neural Networks

Neural Information Processing Systems

The paper is well motivated and quite clear. I like the distinction between statistical, functional and heuristic methods of normalization. Also, investigating normalization techniques that do not rely on mini-batch statistics is an important research direction. I have however a few remarks concerning ON: 1) How does it compare to Batch Renormalization (BRN)? Both methods rely on running averages of statistics, so I think it would be fair to clearly state the differences between the two methods and to thoroughly compare against BRN in the experimental setup, especially because BRN introduces 1 extra hyper-parameter, while one needs to tune 2 of them in ON. 2) How difficult is it to tune both decay rate hyper-parameters?


Reviews: Online Normalization for Training Neural Networks

Neural Information Processing Systems

The authors propose a new normalization technique for training deep networks called online normalization, as an alternative to batch normalization, providing both theoretical analyses and experimental results for the proposed approach. The topic is likely to be of broad interest to the NeurIPS audience given the prevalence of batch normalization in deep learning. All four reviewers found significant merit in the ideas in the paper, but they also had a number of specific technical questions (e.g., by R1 and R4) that should be addressed in the final version of the paper.


Online Normalization for Training Neural Networks

Chiley, Vitaliy, Sharapov, Ilya, Kosson, Atli, Koster, Urs, Reece, Ryan, de la Fuente, Sofia Samaniego, Subbiah, Vishal, James, Michael

Neural Information Processing Systems

Online Normalization is a new technique for normalizing the hidden activations of a neural network. While Online Normalization does not use batches, it is as accurate as Batch Normalization. We resolve a theoretical limitation of Batch Normalization by introducing an unbiased technique for computing the gradient of normalized activations. Online Normalization works with automatic differentiation by adding statistical normalization as a primitive. This technique can be used in cases not covered by some other normalizers, such as recurrent networks, fully connected networks, and networks with activation memory requirements prohibitive for batching.


What Are The Alternatives To Batch Normalization In Deep Learning?

#artificialintelligence

In the original BatchNorm paper, the authors Sergey Ioffe and Christian Szegedy of Google introduced a method to address a phenomenon called internal covariate shift: the distribution of each layer's inputs changes during training as the parameters of the previous layers change. This slows down training by requiring lower learning rates and careful parameter initialisation, making models harder to train. The introduction of batch-normalized networks helped achieve state-of-the-art accuracies with 14 times fewer training steps.
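To make the mini-batch dependence concrete, here is a minimal sketch of the standard BatchNorm forward pass. The names and shapes are illustrative, not code from the paper; note that the mean and variance are computed over the batch dimension, which is exactly what Online Normalization avoids.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-feature batch normalization over the batch dimension.

    x: array of shape (batch, features); gamma, beta: learned
    per-feature scale and shift. Illustrative sketch only.
    """
    mu = x.mean(axis=0)                    # mini-batch mean per feature
    var = x.var(axis=0)                    # mini-batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta            # learned scale and shift

# Shifted, scaled inputs come out approximately standardized
x = np.random.randn(64, 8) * 3.0 + 2.0
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

Because `mu` and `var` are batch statistics, the normalized output of each sample depends on every other sample in the batch, which is what ties BatchNorm's behavior to the batch size.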


Online Normalization for Training Neural Networks

Chiley, Vitaliy, Sharapov, Ilya, Kosson, Atli, Koster, Urs, Reece, Ryan, de la Fuente, Sofia Samaniego, Subbiah, Vishal, James, Michael

arXiv.org Machine Learning

Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization. We resolve a theoretical limitation of Batch Normalization by introducing an unbiased technique for computing the gradient of normalized activations. Online Normalization works with automatic differentiation by adding statistical normalization as a primitive. This technique can be used in cases not covered by some other normalizers, such as recurrent networks, fully connected networks, and networks with activation memory requirements prohibitive for batching. We show its applications to image classification, image segmentation, and language modeling. We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets.
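The idea of normalizing each sample with running statistics instead of mini-batch statistics can be sketched as follows. This is a simplified illustration under my own assumptions, not the authors' implementation: `alpha` stands in for the paper's forward decay rate, the class and variable names are hypothetical, and the backward pass (the unbiased gradient technique the paper introduces) is omitted.

```python
import numpy as np

class OnlineNorm:
    """Sketch of an online-normalization-style forward pass.

    Each sample is normalized using exponentially decaying running
    estimates of the mean and variance, so no mini-batch statistics
    are required. Illustrative only; the paper's backward pass and
    learned scale/shift are omitted.
    """

    def __init__(self, num_features, alpha=0.99, eps=1e-5):
        self.alpha = alpha                  # forward decay rate (assumed name)
        self.eps = eps
        self.mu = np.zeros(num_features)    # running mean estimate
        self.var = np.ones(num_features)    # running variance estimate

    def __call__(self, x):
        # Normalize with statistics from previously seen samples only
        diff = x - self.mu
        y = diff / np.sqrt(self.var + self.eps)
        # Exponential-moving-average updates of the running statistics
        self.var = self.alpha * self.var + self.alpha * (1 - self.alpha) * diff ** 2
        self.mu = self.mu + (1 - self.alpha) * diff
        return y
```

Because the statistics are updated one sample at a time, this applies equally to batch sizes of one, to recurrent networks unrolled over time, and to models whose activation memory makes batching prohibitive, which is the setting the abstract describes.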